Abstract

Bell's (1964) theorem is popularly supposed to establish the non-locality of quantum 
physics as a mathematical-physical theory. Building from this, observed violation 
of Bell's inequality in experiments such as that of Aspect and coworkers (1982) is 
popularly supposed to provide empirical proof of non-locality in the real world. This 
paper reviews recent work on Bell's theorem, linking it to issues in causality as 
understood by statisticians. The paper starts with a new proof of a strong (finite 
sample) version of Bell's theorem which relies only on elementary arithmetic and 
(counting) probability. This proof underscores the fact that Bell's theorem tells us 
that quantum theory is incompatible with the conjunction of three cherished and 
formerly uncontroversial physical principles, nicknamed here locality, realism, and 
freedom. The first, locality, is obviously connected to causality: causal influences 
need time to propagate spatially. Less obviously, the other two principles, realism 
and freedom, are also founded on two ideas central to modern statistical thinking on 
causality: counterfactual reasoning, and the distinction between do-ing X = x and 
selecting on X = x, respectively. I will argue that (accepting quantum theory) Bell's 
theorem should lead us to seriously consider relinquishing not locality, but realism, as 
a fundamental physical principle. The paper goes on to discuss statistical issues, in 
the interpretation of state-of-the-art Bell type experiments, related to post-selection 
in observational studies. Finally I state an open problem concerning the design of a 
quantum Randi challenge: a computer challenge to Bell-deniers. 



1 Introduction 

In this paper I want to discuss Bell's (1964) theorem from the point of view of causality 
as understood in statistics and probability. The paper complements and extends the work 
of Robins, VanderWeele, and Gill (2011), whose general aim is identical. 

Bell's theorem states that certain predictions of quantum mechanics are incompatible 
with the conjunction of three fundamental principles of classical physics which are sometimes
given the short names "realism", "locality" and "freedom". Corresponding real world
experiments, Bell experiments, are supposed to demonstrate that this incompatibility is a
property not just of the theory of quantum mechanics, but also of Nature itself. The
consequence is that we are forced to reject one (or more) of these three principles. 

Both theorem and experiment hinge around an inequality constraining joint probability 
distributions (note plural!) of outcomes of measurements on spatially separated physical 
systems; an inequality which must hold if all three fundamental principles are true. In a 
nutshell, the inequality is an empirically verifiable consequence of the idea that the outcome 
of one measurement on one system cannot depend on which measurement is performed on 
the other. This principle, called locality or, less succinctly, relativistic local causality, is just 
one of the three principles. Its formulation refers to outcomes of measurements which are 
not actually performed, so we have to assume their existence, alongside of the outcomes 
of those actually performed: the principle of realism, or more precisely, counterfactual
definiteness. Finally we need to assume that we have complete freedom to choose which of 
several measurements to perform - this is the third principle, also called the no-conspiracy 
principle. 

We shall implement the freedom assumption as the assumption of statistical indepen- 
dence between the randomization in a randomized experimental design (i.e., the choice of 
experiment), and the outcomes of all the possible experiments combined. This includes the 
"counterfactual" outcomes of those experiments which were not actually performed, as well 
as the "factual" outcome of the experiment actually chosen. By existence of the outcomes 
of not actually performed experiments, we mean their mathematical existence within a 
mathematical physical theory of the phenomenon in question. The concepts of realism and 
locality together are often considered as one principle called local realism. Local realism is 
implied by the existence of local hidden variables, whether deterministic or stochastic. In a 
precise mathematical sense, the reverse implication is also true: local realism implies that 
we can construct a local hidden variable model for the phenomenon under study. How- 
ever one likes to think of this assumption (or pair of assumptions), the important thing 
to realize is that it is a completely unproblematic notion in classical physics; freedom (no 
conspiracy) even more so. 

To begin with I will establish a new version of the famous Bell inequality (more precisely: 
Bell-CHSH inequality) using very elementary logic, arithmetic and (discrete) probability. 
My version will not be an inequality involving (theoretical) expectation values of physical 
quantities, but it will be a probabilistic inequality involving experimentally observed av- 
erages. Moreover, the probabilistic component does not refer to random variation in the 
outcome of a given measurement on a physical system, but to the experimenter's freedom 
to choose which measurement to perform: i.e., to the randomness involved in implementing 
a randomized experimental design. Proving Bell's theorem in this way avoids reliance on 
abstract concepts (theoretical expectation values) and expresses the result directly in terms 
of observational data. The proof is new, complete, and completely elementary - it can be 
explained to a science journalist or to an intelligent teenager. It brings out the impor- 
tance of the "realism" and "freedom" assumptions alongside of the "locality" assumption, 
making clear that a violation of Bell's inequality implies that one or more of the three 
assumptions must fail, without determining which of the three is at fault. 






In view of the experimental support for violation of Bell's inequality (despite shortcomings
of experiments done to date, to be described later in the paper), the present writer prefers 
to imagine a world in which "realism" is not a fundamental principle of physics but only 
an emergent property in the familiar realm of daily life (including the world of applied 
statisticians). In this way we can have quantum mechanics and locality and freedom. He 
believes that within this position, the measurement problem (Schrödinger's cat problem)
has a decent mathematical solution. This position does entail taking quantum randomness 
very seriously: it becomes an irreducible feature of the physical world, a "primitive notion" ; 
it is not "merely" an emergent feature. It is moreover fundamentally connected with an 
equally fundamental arrow of time . . . and all this is the logical consequence of demanding 
that quantum physics respect temporal and spatial causality. 

2 Bell's inequality 

We will derive a version of Bell's inequality in three steps involving elementary logic, 
arithmetic and probability respectively. Throughout this section I use the word mean as 
shorthand for arithmetic mean; thus: "mean", as in $\bar{x}$. It is not meant to imply taking
theoretical expectation values, but is simply a synonym for average. 

2.1 Logic 

Lemma 1 For any four numbers A, A' , B , B' each equal to ±1, 

AB + AB' + A'B - A'B' = ±2. (1) 

Proof of Lemma 1 Notice that

AB + AB' + A'B - A'B' = A(B + B') + A'(B - B').

B and B' are either equal to one another or unequal. In the former case, B - B' = 0 and
B + B' = ±2; in the latter case B - B' = ±2 and B + B' = 0. Thus AB + AB' + A'B - A'B'
equals ±2 times A or times A', which both equal ±1. Either way, AB + AB' + A'B - A'B' =
±2. □

2.2 Arithmetic 

Consider a spreadsheet containing a $4N \times 4$ table of numbers ±1. The rows will be labelled
by an index $j = 1, \dots, 4N$. The columns are labelled with names A, A', B and B'. I will
denote the four numbers in the jth row of the table by $A_j$, $A'_j$, $B_j$ and $B'_j$. Denote by
$\langle AB \rangle = \frac{1}{4N} \sum_{j=1}^{4N} A_j B_j$ the mean over the 4N rows of the product of the elements in
the A and B columns. Define $\langle AB' \rangle$, $\langle A'B \rangle$, $\langle A'B' \rangle$ similarly.

Lemma 2

$\langle AB \rangle + \langle AB' \rangle + \langle A'B \rangle - \langle A'B' \rangle \le 2.$ (2)

Proof of Lemma 2 By (1),

$\langle AB \rangle + \langle AB' \rangle + \langle A'B \rangle - \langle A'B' \rangle = \langle AB + AB' + A'B - A'B' \rangle \in [-2, 2].$ □

Formula (2) is "just" the CHSH inequality (Clauser, Horne, Shimony and Holt, 1969), the
basis of standard proofs of Bell's theorem, espoused by Bell in his later work, and the
raison d'être of many modern experiments confirming Bell's theorem; see next section.
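
The arithmetic step can be illustrated numerically. The sketch below fills a table with arbitrary pseudo-random signs (the seed, the value of N and the helper name `table_means` are assumptions made only for this illustration), computes the four column-product means, and compares their combination with the mean of the row-wise combination from Lemma 1.

```python
import math
import random


def table_means(table):
    """Return the means of AB, AB', A'B and A'B' over all rows of the table."""
    n_rows = len(table)
    ab = sum(a * b for a, ap, b, bp in table) / n_rows
    abp = sum(a * bp for a, ap, b, bp in table) / n_rows
    apb = sum(ap * b for a, ap, b, bp in table) / n_rows
    apbp = sum(ap * bp for a, ap, b, bp in table) / n_rows
    return ab, abp, apb, apbp


rng = random.Random(0)  # arbitrary fixed seed, for reproducibility only
N = 250
# Each row holds (A, A', B, B'), every entry equal to +1 or -1.
table = [tuple(rng.choice((-1, 1)) for _ in range(4)) for _ in range(4 * N)]

ab, abp, apb, apbp = table_means(table)
s_from_means = ab + abp + apb - apbp
# Mean over rows of the Lemma 1 combination, each term of which is +2 or -2.
s_from_rows = sum(a * b + a * bp + ap * b - ap * bp for a, ap, b, bp in table) / len(table)
```

Both quantities agree up to rounding and lie in [-2, 2], whatever signs are put into the table.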

2.3 Probability 

Now suppose that for each row of the spreadsheet, two fair coins are tossed independently 
of one another, independently over all the rows. Suppose that depending on the outcomes 
of the two coins, we either observe A or A', and we either observe B or B'. We are therefore 
able to observe just one of the four products AB, AB', A'B, and A'B', each with equal 
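probability 1/4, for each row of the table. Denote by $\langle AB \rangle_{\mathrm{obs}}$ the mean of the observed
products of A and B ("undefined" if the sample size is zero). Define $\langle AB' \rangle_{\mathrm{obs}}$, $\langle A'B \rangle_{\mathrm{obs}}$
and $\langle A'B' \rangle_{\mathrm{obs}}$ similarly.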

When N is large one would expect $\langle AB \rangle_{\mathrm{obs}}$ to be not much different from $\langle AB \rangle$, and the
same for the other three means of observed products. Hence, equation (2) should remain
approximately true when we replace the means of the four products over all 4N rows with
the means of the four products in each of four disjoint sub-samples of expected size N 
each. 

This intuitively obvious fact can be put into numbers through use of the Hoeffding 
(1963) inequality, a uniform exponential bound to the tails of the binomial distributions 
and hypergeometric distributions (i.e., sampling with and without replacement). The rest of
this subsection will provide the details of a proof of such a probabilistic ("finite statistics") 
version of the Bell-CHSH inequality: 

Theorem 1 Given a $4N \times 4$ spreadsheet of numbers ±1 with columns A, A', B and B',
suppose that, completely at random, just one of A and A' is observed and just one of B
and B' is observed in every row. Then, for any $0 \le \eta \le 2$,

$\Pr\bigl( \langle AB \rangle_{\mathrm{obs}} + \langle AB' \rangle_{\mathrm{obs}} + \langle A'B \rangle_{\mathrm{obs}} - \langle A'B' \rangle_{\mathrm{obs}} \le 2 + \eta \bigr) \ge 1 - 8 e^{-N(\eta/8)^2}.$ (3)

Traditional presentations of Bell's theorem focus on what could be called the large N limit
of this result. If it is true that as $N \to \infty$, experimental averages converge to some kind
of theoretical mean values, then these must satisfy

$\langle AB \rangle_{\mathrm{lim}} + \langle AB' \rangle_{\mathrm{lim}} + \langle A'B \rangle_{\mathrm{lim}} - \langle A'B' \rangle_{\mathrm{lim}} \le 2.$ (4)

Like (2), this inequality is also called the CHSH inequality. There is actually quite a lot 
in between, as we have seen. 

The proof of (3) will use the following two results:

Fact 1 (Hoeffding's theorem, binomial case) Suppose $X \sim \mathrm{Bin}(n, p)$ and $t > 0$.
Then

$\Pr(X/n \ge p + t) \le \exp(-2nt^2).$

Fact 2 (Hoeffding's theorem, hypergeometric case) Suppose X is the number of red
balls found in a sample without replacement of size n from a vase containing $pM$ red balls
and $(1 - p)M$ blue balls and $t > 0$. Then

$\Pr(X/n \ge p + t) \le \exp(-2nt^2).$

Hoeffding's paper gives other, even sharper results - variants of Bennett's inequality, for
instance. I have gone for simplicity in the expression of the probability inequality (3), not
for the sharpest result possible.
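
For small cases both facts can be compared with exact tail probabilities. The sketch below is only a numerical sanity check on a handful of arbitrarily chosen parameter values (the function names and the test cases are assumptions of this illustration); it is not a substitute for Hoeffding's proof.

```python
import math


def binomial_upper_tail(n, p, t):
    """Exact Pr(X/n >= p + t) for X ~ Bin(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if k / n >= p + t
    )


def hypergeometric_upper_tail(n, red, total, t):
    """Exact Pr(X/n >= red/total + t), X = red balls in n draws without replacement."""
    p = red / total
    denominator = math.comb(total, n)
    return sum(
        math.comb(red, k) * math.comb(total - red, n - k) / denominator
        for k in range(n + 1)
        if k / n >= p + t
    )


def hoeffding_bound(n, t):
    """The common upper bound exp(-2 n t^2) of Facts 1 and 2."""
    return math.exp(-2 * n * t**2)


# Arbitrary illustrative cases with p = 1/4 (a vase of 1600 balls, 400 of them red).
cases = [(n, t) for n in (20, 100, 400) for t in (0.05, 0.1, 0.2)]
binomial_ok = all(
    binomial_upper_tail(n, 0.25, t) <= hoeffding_bound(n, t) for n, t in cases
)
hypergeometric_ok = all(
    hypergeometric_upper_tail(n, 400, 1600, t) <= hoeffding_bound(n, t) for n, t in cases
)
```

In every case tried the exact tail probability lies below the exponential bound, as the two facts assert.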

Proof of Theorem 1 In each row of our $4N \times 4$ table of numbers ±1, the product AB
equals ±1. For each row, with probability 1/4, the product is either observed or not
observed. Let $N^{\mathrm{obs}}_{AB}$ denote the number of rows in which both A and B are observed. Then
$N^{\mathrm{obs}}_{AB} \sim \mathrm{Bin}(4N, 1/4)$, and hence by Fact 1, for any $\delta > 0$,

$\Pr\bigl( N^{\mathrm{obs}}_{AB} \le (1 - 4\delta) N \bigr) = \Pr\Bigl( \frac{N^{\mathrm{obs}}_{AB}}{4N} \le \frac14 - \delta \Bigr) \le \exp(-8N\delta^2).$

Let $N^{+}_{AB}$ denote the total number of rows (i.e., out of 4N) for which AB = +1, define
$N^{-}_{AB}$ similarly. Let $N^{\mathrm{obs},+}_{AB}$ denote the number of rows such that AB = +1 among those
selected for observation of A and B. Conditional on $N^{\mathrm{obs}}_{AB} = n$, $N^{\mathrm{obs},+}_{AB}$ is distributed as
the number of red balls in a sample without replacement of size n from a vase containing
4N balls of which $N^{+}_{AB}$ are red and $N^{-}_{AB}$ are blue. Therefore by Fact 2, conditional on
$N^{\mathrm{obs}}_{AB} = n$, for any $\varepsilon > 0$,

$\Pr\Bigl( \frac{N^{\mathrm{obs},+}_{AB}}{N^{\mathrm{obs}}_{AB}} \ge \frac{N^{+}_{AB}}{4N} + \varepsilon \Bigr) \le \exp(-2n\varepsilon^2).$

We introduced above the notation $\langle AB \rangle$ for the mean of the product AB over the whole
table; this can be rewritten as

$\langle AB \rangle = \frac{N^{+}_{AB} - N^{-}_{AB}}{4N} = 2\,\frac{N^{+}_{AB}}{4N} - 1.$

Similarly, $\langle AB \rangle_{\mathrm{obs}}$ denoted the mean of the product AB just over the rows of the table for
which both A and B are observed; this can be rewritten as

$\langle AB \rangle_{\mathrm{obs}} = \frac{N^{\mathrm{obs},+}_{AB} - N^{\mathrm{obs},-}_{AB}}{N^{\mathrm{obs}}_{AB}} = 2\,\frac{N^{\mathrm{obs},+}_{AB}}{N^{\mathrm{obs}}_{AB}} - 1.$

Given $\delta > 0$ and $\varepsilon > 0$, all of $N^{\mathrm{obs}}_{AB}$, $N^{\mathrm{obs}}_{AB'}$, $N^{\mathrm{obs}}_{A'B}$ and $N^{\mathrm{obs}}_{A'B'}$ are at least $(1 - 4\delta)N$ with
probability at least $1 - 4\exp(-8N\delta^2)$. On the event where this happens, the conditional
probability that $\langle AB \rangle_{\mathrm{obs}}$ exceeds $\langle AB \rangle + 2\varepsilon$ is bounded by

$\exp(-2 N^{\mathrm{obs}}_{AB} \varepsilon^2) \le \exp(-2N(1 - 4\delta)\varepsilon^2).$

The same is true for the other three means (for the last one we first exchange the roles of
+ and - to get a bound on $\langle -A'B' \rangle_{\mathrm{obs}}$). Combining everything, we get that

$\langle AB \rangle_{\mathrm{obs}} + \langle AB' \rangle_{\mathrm{obs}} + \langle A'B \rangle_{\mathrm{obs}} - \langle A'B' \rangle_{\mathrm{obs}} \le 2 + 8\varepsilon,$

except possibly on an event of probability at most

$p = 4\exp(-8N\delta^2) + 4\exp(-2N(1 - 4\delta)\varepsilon^2).$

Choosing $\delta = \varepsilon/(2\sqrt{2})$ and restricting attention to $2(1 - 4\delta) \ge 1$, i.e., $\delta \le 1/8$, $\varepsilon \le 1/(2\sqrt{2})$,
we can bound p by the simpler expression

$p \le 8\exp(-N\varepsilon^2).$

Replacing $8\varepsilon$ by $\eta$ gives us (3), since $\eta \le 2$ implies $\varepsilon \le 1/4 < 1/(2\sqrt{2})$. □
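
The content of Theorem 1 can be illustrated by simulation. The sketch below is a minimal illustration under stated assumptions: the table is filled only with rows for which the Lemma 1 combination equals +2 (so that (2) holds with equality), the values N = 400 and η = 1 and the number of replications are arbitrary, and the bound in (3) is taken to be 8 exp(-N(η/8)²), which is what the last step of the proof gives on setting η = 8ε.

```python
import math
import random
from itertools import product


def observed_chsh(table, rng):
    """Toss two fair coins per row and return the observed CHSH combination.

    Returns None if one of the four sub-samples happens to be empty.
    """
    sums = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for a, a_prime, b, b_prime in table:
        i = rng.randrange(2)  # 0: observe A, 1: observe A'
        j = rng.randrange(2)  # 0: observe B, 1: observe B'
        x = a_prime if i else a
        y = b_prime if j else b
        sums[i, j] += x * y
        counts[i, j] += 1
    if min(counts.values()) == 0:
        return None
    mean = {key: sums[key] / counts[key] for key in sums}
    return mean[0, 0] + mean[0, 1] + mean[1, 0] - mean[1, 1]


rng = random.Random(1)  # arbitrary fixed seed
N, eta, replications = 400, 1.0, 200

# Rows (A, A', B, B') for which AB + AB' + A'B - A'B' equals +2.
extreme_rows = [
    r for r in product((-1, 1), repeat=4)
    if r[0] * r[2] + r[0] * r[3] + r[1] * r[2] - r[1] * r[3] == 2
]
table = [rng.choice(extreme_rows) for _ in range(4 * N)]

results = [observed_chsh(table, rng) for _ in range(replications)]
results = [s for s in results if s is not None]
exceed_fraction = sum(s > 2 + eta for s in results) / len(results)
bound = 8 * math.exp(-N * (eta / 8) ** 2)
```

The observed combination fluctuates around 2 from one set of coin tosses to the next, and so exceeds 2 in a sizeable fraction of replications, but the fraction of replications in which it exceeds 2 + η should stay below `bound`. The randomness here comes only from the coin tosses; the table itself is held fixed.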

3 Bell's Theorem 

Formulas (2) and (4) (the physics literature often does not notice the difference) are both 
commonly called Bell's inequality, or the Bell-CHSH inequality, or just the CHSH inequality.
The inequality goes back to Clauser, Horne, Shimony and Holt (1969), and is a variant
of a similar and earlier inequality of Bell (1964). Both the original Bell inequality and the
Bell-CHSH inequality can be used to prove Bell's theorem: quantum mechanics is incompatible
with the principles of realism, locality and freedom. In other words, if we want to hold on 
to all three principles, quantum mechanics must be rejected. Alternatively, if we want to 
hold on to quantum theory, we have to relinquish at least one of those three principles. 

The executive summary of the proof consists of the following remark: certain models 
in quantum physics predict 

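$\langle AB \rangle_{\mathrm{lim}} + \langle AB' \rangle_{\mathrm{lim}} + \langle A'B \rangle_{\mathrm{lim}} - \langle A'B' \rangle_{\mathrm{lim}} = 2\sqrt{2} > 2.$ (5)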

Moreover, as far as most authorities in physics are concerned, the prediction (5) has been 
amply confirmed by experiment: therefore, whatever we may think of quantum theory, at 
least one of locality, realism or freedom has to go. 

Almost no-one is prepared to abandon freedom. For a long time the consensus was that
locality was at fault, but in recent years it seems to be shifting towards putting the blame
on realism. In essence, this is "back to Bohr" (Copenhagen interpretation): not in the 
form of a dogma, a prohibition to speak of "what is actually going on behind the scenes" , 
but positively embraced: intrinsic quantum randomness is what happens! There is nothing 
behind the scenes! 

It is important to note that experiments do not exhibit a violation of (2): that would 
be a logical impossibility. However, an experiment could certainly in principle give strong 
evidence that (3) or (4) is false. We'll engage in some nit-picking about whether or not 
such experiments have already been done, later in the paper.

Fortunately, one can understand quite a lot of (5) without any understanding of quantum
mechanics: we just need to know certain simple statistical predictions which follow
from a particular special model in quantum physics called the EPR-B model. The initials 
refer here to the celebrated paradox of Einstein, Podolsky and Rosen (1935) in a version 
introduced by David Bohm (1951). 

Recall that quantum physics is a stochastic theory (the physicists say: a statistical theory):
it allows us to predict the probabilities of outcomes of measurements on quantum
systems, not (in general) the actual outcomes. The EPR paradox and Bell's theorem are 
two landmarks in the history of the ongoing struggle of many physicists over the last cen- 
tury to find a more classical-like theory behind quantum theory: to explain the statistics 
by explaining what is actually happening behind the scenes, as it were. The purpose of 
this paper is to review that program from a statistician's point of view. 

The EPR-B model is a model which predicts the statistics of the measurement of spin 
on an entangled pair of spin-half quantum systems or particles in the singlet state. For 
our purposes it is not necessary to explain this terminology at all: all we need to know 
are the (statistical) predictions of the model, in the context of an informal description 
of the corresponding experiment. This informal description will use the word "particle" 
many times, but only to help the reader to visualize the experimental set-up. The concept 
of "particle" might or might not be part of a theoretical explanation of the experiment, 
depending on what kind of physics we believe is going on here; but what kind of physics it 
might be, is precisely the question we are investigating. 

In one run of the experiment, two particles are generated together at a source, and then 
travel to two distant locations. Here, they are measured by two experimenters Alice and 
Bob. Alice and Bob are each in possession of a measurement apparatus which can "measure 
the spin of a particle in any chosen direction" . Alice (and similarly, Bob) can freely choose 
(and set) a setting on her measurement apparatus. Alice's setting is an arbitrary direction 
in real three-dimensional space represented by a unit vector a, say. Her apparatus will then 
register an observed outcome ±1 which is called the observed "spin" of Alice's particle in 
direction a. At the same time, far away, Bob chooses a direction b and also gets to observe 
an outcome ±1, the observed "spin" of Bob's particle in direction b. This is repeated many 
times - i.e., the complete experiment will consist of many, say 4N, runs. We'll imagine
Alice and Bob repeatedly choosing new settings for each new run, in the same fashion as in 
Section 2: each tossing a fair coin to make a binary choice between two possible settings, 
a and a' for Alice, b and b' for Bob. First we will complete our description of the quantum 
mechanical predictions for each run separately. 

For pairs of particles generated in a particular quantum state, and with perfectly imple- 
mented measurement apparatus, the prediction of quantum mechanics is that in whatever 
directions Alice and Bob perform their measurements, their outcomes ±1 are perfectly random
(i.e., equally likely +1 as -1). However, there is a correlation between the outcomes
of the two measurements at the two locations, depending on the two settings: the expected
value of the product of the outcomes is given by the inner product $-a \cdot b = -\cos(\theta)$, where
$\theta$ is the angle between the two directions.

In particular, if the two measurement directions are the same, the two outcomes will
always be opposite to one another. If the measurement directions are orthogonal, the two
outcomes will be statistically independent of one another. If the two directions are opposite 
to one another, the two outcomes will always be equal. The just mentioned cosine rule 
is the smoothest possible way one can imagine that Nature could choose to interpolate 
between these three special cases, which in themselves do not seem surprising at all: one 
can easily invent a simple mechanistic theory which conforms to those predictions, in which 
the two particles "agree with one another" at the source, what their responses will be to 
any possible setting which they will separately encounter at the two measurement stations. 
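
One such mechanistic theory can be written down in a few lines. The model below is an illustrative choice made for this sketch, not one taken from the text: the two particles share a hidden angle λ, uniformly distributed on the circle; Alice's particle answers with the sign of cos(α - λ) and Bob's with minus the sign of cos(β - λ), where α and β are the measurement angles within a fixed plane. Each outcome depends only on the local setting and on λ.

```python
import math


def sign(x):
    """Return +1 for non-negative x and -1 otherwise."""
    return 1 if x >= 0 else -1


def lhv_correlation(alpha, beta, n_grid=20_000):
    """Mean product of outcomes, averaged over a uniform grid of hidden angles."""
    total = 0
    for k in range(n_grid):
        lam = 2 * math.pi * (k + 0.5) / n_grid  # hidden angle shared by both particles
        a = sign(math.cos(alpha - lam))   # Alice's outcome: uses alpha and lam only
        b = -sign(math.cos(beta - lam))   # Bob's outcome: uses beta and lam only
        total += a * b
    return total / n_grid


same = lhv_correlation(0.3, 0.3)                       # equal directions
orthogonal = lhv_correlation(0.3, 0.3 + math.pi / 2)   # orthogonal directions
opposite = lhv_correlation(0.3, 0.3 + math.pi)         # opposite directions
```

The three values are -1, approximately 0 and +1 respectively, matching the three special cases. Between these cases this particular model interpolates linearly in the angle rather than following the cosine rule.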

With this information we can write down the complete 2 × 2 table for the probabilities
of the outcomes

    ++    +-
    -+    --

at the two locations, given two settings differing in direction by the angle $\theta$: the four
probabilities are

    $\frac14(1 - \cos(\theta))$    $\frac14(1 + \cos(\theta))$
    $\frac14(1 + \cos(\theta))$    $\frac14(1 - \cos(\theta))$.

Both marginals of the table are uniform. The "correlation", or expectation of the product,
equals the probability the outcomes are equal minus the probability they are different,
hence is equal to $\frac12(1 - \cos(\theta)) - \frac12(1 + \cos(\theta)) = -\cos(\theta)$.
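
The table and its two stated properties can be verified directly. In the sketch below the function names `singlet_table`, `correlation` and `alice_marginal` are hypothetical helpers introduced only for this check, and the test angle is arbitrary.

```python
import math


def singlet_table(theta):
    """Joint probabilities of (Alice, Bob) outcomes for settings at angle theta."""
    equal = 0.25 * (1 - math.cos(theta))      # probability of ++ and of --
    different = 0.25 * (1 + math.cos(theta))  # probability of +- and of -+
    return {(+1, +1): equal, (+1, -1): different,
            (-1, +1): different, (-1, -1): equal}


def correlation(table):
    """Expectation of the product of the two outcomes."""
    return sum(x * y * p for (x, y), p in table.items())


def alice_marginal(table):
    """Probability that Alice's outcome is +1."""
    return table[+1, +1] + table[+1, -1]


theta = 0.7  # arbitrary test angle, in radians
table = singlet_table(theta)
rho = correlation(table)
```

Here `rho` equals -cos(θ) and `alice_marginal(table)` equals 1/2, up to floating-point rounding.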

Consider now the following experimental set-up. Alice is allocated in advance two 
fixed directions a and a'; Bob is allocated in advance two fixed directions b and b'. The 
experiment is built up of 4N runs. In each run, Alice and Bob are each sent one of a new 
pair of particles in the singlet state. While their particles are en route to them, they each 
toss a fair coin in order to choose one of their two measurement directions. In total 4N
times, Alice observes either A = ±1 or A' = ±1 say, and Bob observes either B = ±1 
or B' = ±1. At the end of the experiment, four "correlations" are calculated; these are 
simply the four sample means of the products AB, AB', A'B and A'B' . Each correlation 
is based on a different subset, of expected size N runs, and determined by the 8N fair coin 
tosses. 

Under realism we can imagine, for each run, alongside of the outcomes of the actually 
measured pair of variables, also the outcomes of the not measured pair. Now, the outcomes 
in Alice's wing of the experiment might in principle depend on the choice of which variable 
is measured in Bob's wing, but under locality this is excluded. Thus, for each run there is 
a suite of potential outcomes A, A', B and B', but only one of A and A', and only one of B 
and B' actually gets to be observed. By freedom the choices are statistically independent 
of the actual values of the four. 

I'll assume furthermore that the suite of counterfactual outcomes in the jth run does 
not actually depend on which particular variables were observed in previous runs. This 
memoryless assumption, or Ansatz as physicists like to say, can be completely avoided by 
using the martingale version of Hoeffding's inequality, Gill (2003). But the present analysis 
is already applicable (i.e., without adding a fourth assumption) if we imagine 4N copies
of the experiment each with only a single run, all being done simultaneously in widely
separated locations. A Gedankenexperiment if ever there was one, but perfectly legitimate 
for present purposes. (Although: possibly cheaper than finding the Higgs boson, and in 
my opinion a whole lot more exciting). 

The assumptions of realism, locality and freedom have put us full square in the situation 
of the previous section. Therefore by Theorem 1, the four sample correlations (empirical 
raw product moments) satisfy (3). 

Let's contrast this prediction with the quantum mechanical predictions obtained with 
a certain clever selection of directions. We'll take the four vectors a, a', b and b' to 
lie in the same plane. It's then enough to specify the angles $\alpha, \alpha', \beta, \beta' \in [0, 2\pi]$ which
they make with respect to some fixed vector in this plane. Consider the choice $\alpha = 0$,
$\alpha' = \pi/2$, $\beta = 5\pi/4$, $\beta' = 3\pi/4$. The differences $|\alpha - \beta|$, $|\alpha - \beta'|$, $|\alpha' - \beta|$ are all equal to
$\pi \pm \pi/4$: these pairs of angles are all "close to opposite" to one another; the corresponding
measurements are strongly positively correlated. On the other hand, $|\alpha' - \beta'| = \pi/4$: the
two angles are "close to equal" and the corresponding measurements are as strongly anti-correlated
as the other pairs were strongly correlated. Three of the correlations are equal
to $-\cos(3\pi/4) = -(-1/\sqrt{2}) = 1/\sqrt{2}$ and the fourth is equal to $-\cos(\pi/4) = -1/\sqrt{2}$.
Thus we would expect to see, up to statistical variation (statistical variation in the coin
toss outcomes!),

$\langle AB \rangle_{\mathrm{obs}} + \langle AB' \rangle_{\mathrm{obs}} + \langle A'B \rangle_{\mathrm{obs}} - \langle A'B' \rangle_{\mathrm{obs}} \approx 4/\sqrt{2} = 2\sqrt{2} \approx 2.828 > 2,$

cf. (5). By Tsirelson's inequality (Cirel'son, 1980), this is actually the largest absolute
deviation from the CHSH inequality which is allowed by quantum mechanics. 
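
The arithmetic behind the value 2√2 is easy to reproduce. The sketch below evaluates the singlet correlation, minus the cosine of the angle between the settings, at the four pairs of angles 0, π/2 for Alice and 5π/4, 3π/4 for Bob; the variable and function names are chosen only for this illustration.

```python
import math


def singlet_correlation(angle_alice, angle_bob):
    """Quantum prediction for the mean product of outcomes: -cos of the angle difference."""
    return -math.cos(angle_alice - angle_bob)


alpha, alpha_prime = 0.0, math.pi / 2
beta, beta_prime = 5 * math.pi / 4, 3 * math.pi / 4

# The four correlations entering the CHSH combination.
e_ab = singlet_correlation(alpha, beta)
e_ab_prime = singlet_correlation(alpha, beta_prime)
e_a_prime_b = singlet_correlation(alpha_prime, beta)
e_a_prime_b_prime = singlet_correlation(alpha_prime, beta_prime)

s_quantum = e_ab + e_ab_prime + e_a_prime_b - e_a_prime_b_prime
```

Three of the four correlations come out as 1/√2 and the last as -1/√2, so `s_quantum` equals 2√2 up to rounding.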

Now many experiments have been performed confirming the predictions of quantum 
mechanics, beautifully. The most famous experiments to date are those performed by 
Aspect et al. (1982) in Orsay, Paris, and by Weihs et al. (1998) in Innsbruck. In these 
experiments, the choices of which direction to measure are not literally made with coin 
tosses performed by human beings, but by physical systems which attempted to imitate 
such processes as closely as possible. In both cases the separation between the locations 
of Alice and Bob is large; while the time it takes from initiating the choice of direction to 
measure to completion of the measurement (the time when an outcome ±1 is irrevocably 
committed to a computer data base) is small: so small, that Alice's measurement is com- 
plete before a signal traveling at the speed of light could possibly transmit Bob's choice 
to Alice's location. (This depends on what one considers to be the "time of initiation". 
Aspect's experiment can be thought to be less rigorous than Weihs' in this respect. More 
details later). 

The data gathered from the Innsbruck experiment is available online. It had $4N \approx 15\,000$
and found $\langle AB \rangle_{\mathrm{obs}} + \langle AB' \rangle_{\mathrm{obs}} + \langle A'B \rangle_{\mathrm{obs}} - \langle A'B' \rangle_{\mathrm{obs}} = 2.73 \pm 0.022$, the
statistical accuracy (standard deviation) following from a standard delta-method calculation
assuming i.i.d. observations per setting pair. The reader can check that this indeed corresponds
to the accuracy obtained by a standard computation using the binomial variances of
the estimated probabilities of equal outcomes for each of the four subsamples.

By (3), under realism, locality and freedom, the chance that $\langle AB \rangle_{\mathrm{obs}} + \langle AB' \rangle_{\mathrm{obs}} +
\langle A'B \rangle_{\mathrm{obs}} - \langle A'B' \rangle_{\mathrm{obs}}$ would exceed 2.73 is less than $10^{-12}$.
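
This figure can be checked with a one-line computation. The sketch below assumes that the bound on the exceptional probability in (3) is 8 exp(-N(η/8)²), the form obtained at the end of the proof of Theorem 1 with η = 8ε, and plugs in the figures quoted above: 4N = 15 000 runs and η = 2.73 - 2. The function name is hypothetical.

```python
import math


def theorem1_tail_bound(n, eta):
    """Upper bound 8 exp(-N (eta/8)^2) on the chance of exceeding 2 + eta."""
    return 8 * math.exp(-n * (eta / 8) ** 2)


runs = 15_000       # approximate total number of runs, 4N
n = runs / 4        # N, the expected size of each of the four sub-samples
eta = 2.73 - 2.0    # excess of the reported value over the bound 2

tail_bound = theorem1_tail_bound(n, eta)
```

Under these assumptions `tail_bound` comes out below 10^-12.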

I will return to some important features of this data in a later section. The experiment 
deviates in several ways from what has been described so far, and I will just summarize 
them now. 

An unimportant detail is the physical system used: polarization of entangled photons 
rather than spin of entangled spin-half particles (e.g., electrons). The polarization of entangled
photons, in the appropriate state, measured in the same direction, is equal, not
opposite; and it is after a rotation of $\pi/2$, not $\pi$, that the polarization of such photons
becomes "opposite". A polarization filter - e.g., the glass in a pair of polaroid sunglasses 
- distinguishes between "horizontally" and "vertically" polarized light (difference: 90 de- 
grees); a Stern-Gerlach device distinguishes between "spin up" and "spin down" (difference: 
180 degrees). Doubling or halving the angle, and reversal of one of the outcomes, takes us 
from one model to the other. 

An important and unjustly neglected difference between the idealization and the truth 
concerns the idea that in advance we decide to create 4N individual pairs of entangled 
particles. In the real world experiment with photons, there is no way to control when 
a pair of photons will leave the source. Even talking about "pairs of photons" is using 
classical physical language which can be acutely misleading. In actual fact, all we observe 
are individual detection events (time, current setting, outcome) at each of the two detec- 
tors, i.e., at each measurement apparatus. We do not observe, let alone control, times of 
emissions from the source! 

Also extremely important (but less neglected) is the fact that (in our naive particle 
picture) many particles fail to be detected at all. One could say that the outcome of 
measuring one particle is not binary but ternary: +, -, or no detection.

In combination with the previous difficulty, if particles are not being transmitted at fixed 
times, then, if neither particle is detected we do not even know if there was a corresponding 
emission of a pair of particles. The data cannot be summarized in a list of pairs of settings 
and pairs of outcomes (whether binary or ternary), but consists of two lists of the random
times of definite measurement outcomes in each wing of the experiment together with the 
settings in force at the time of the measurements (which are being rapidly switched, at 
random times). 

When detection events occur close together in time they are treated as belonging to a 
pair of photons, i.e., as belonging to the same "run" of the experiment. Using the language 
of "runs" and "photons" for convenience (i.e., without wishing to imply that "runs" and 
"photons" are objective concepts): not every photon which arrives at Alice's or Bob's 
detector actually gets measured at all. Only about one in twenty times that there is a 
measurement event in one wing of the experiment, is there also an event in the other wing, 
within such a short time interval that the pair can be considered as belonging together. 
Relative to a naive picture of pairs of particles leaving the source, individually some getting 
measured, some not, the observed statistics suggest that only one in twenty photons gets 
detected (and hence measured), only one in four hundred pairs of photons get measured 
together.

As just mentioned, the choice of measurement direction was not made "anew" for
each new pair of particles. It was simply being constantly "chosen anew". Many times 
after a switch of measurement setting, no particle (apparently) arrives at all; there is no 
measurement outcome in that cycle. In the Orsay experiment measurement directions in 
the two wings of the experiment varied rapidly and cyclically, with periods whose ratio 
was not close to a ratio of small integers. Thus in fact, the switches between measurement 
settings were fixed in advance. Measurement times are effectively random, and widely 
spread in time relative to the period of the switches between measurement directions. In 
the Innsbruck experiment, the switches between measurement direction were made locally, 
at each measurement station, by a very rapid (quantum!) random number generator. 

We will return to the issue of whether the idealized picture of 4N pairs of particles, each
separately being measured, each particle in just one of two ways, is really appropriate, in 
a later section. However, the point is that quantum mechanics does seem to promise that 
experiments of this nature could in principle be done, and if so, there seems no reason 
to doubt they could violate the CHSH inequality. Three correlations more or less equal
to $1/\sqrt{2}$ and one equal to $-1/\sqrt{2}$ have been measured in the lab. Not to mention that
the whole curve $\cos(\theta)$ has been experimentally recovered. What would this mean if the
experiments had been perfect? What is the chance they'll ever be perfected? 

4 Realism, locality, freedom 

The EPR-B correlations have a second message, beyond the fact that they violate the CHSH 
inequality. They also exhibit perfect anti-correlation in the case that the two directions of 
measurement are exactly equal - and perfect correlation in the case that they are exactly 
opposite. This brings us straight to the EPR argument not for the non-locality of quantum 
mechanics, but for the incompleteness of quantum mechanics. 

Einstein, Podolsky and Rosen (1935) were revolted by the idea that the "last word" 
in physics would be a "merely" statistical theory. Physics should explain why, in each 
individual instance, what actually happens does happen. The belief that every "effect" 
must have a "cause" has driven Western science since Aristotle. Now according to the 
singlet correlations, if Alice were to measure the spin of her particle in direction a, it's 
certain that if Bob were to do the same, he would find exactly the opposite outcome. Since 
it is inconceivable that Alice's choice has any immediate influence on the particle over at 
Bob's place, it must be that the outcome of measuring Bob's particle in the direction a is 
predetermined "in the particle" as it were. The measurement outcomes from measuring 
spin in all conceivable directions on both particles must be predetermined properties of 
those particles. The observed correlation is merely caused by their origin at a common 
source. 

Thus Einstein uses locality and the predictions of quantum mechanics itself to infer 
realism, more properly called counterfactual definiteness: the notion that the outcomes 
of measurements on physical systems are predefined properties of those systems, merely 
revealed by the act of measurement. From this he argues for the incompleteness of quantum 
mechanics: it describes some aggregate properties of collectives of physical systems, but does not 
even deign to talk about physically definitely existing properties of individual systems. 

Whether it needed external support or not, the notion of counterfactual definiteness 
is nothing strange in all of physics (at least, prior to the invention of quantum mechan- 
ics). It belongs with a deterministic view of the world as a collection of objects blindly 
obeying definite rules. Note however that the CHSH proof of Bell's theorem does not start 
by inferring counterfactual definiteness from other properties. A wise move, since in ac- 
tual experiments, we would never observe exactly perfect correlation (or anti-correlation). 
And even if we have observed it one thousand times, this does not prove that the "true 
correlation" is +1; it only proves, statistically, that it is very close to +1. 

Be that as it may, Bell's theorem uses three assumptions to derive the CHSH inequality, 
and the first is counterfactual definiteness. Only after we agree that, even if only, say, A' 
and B were actually measured in one particular run, A and B' also exist at least in 
some mathematical sense alongside the other two, does it make sense to discuss locality: 
the assumption that which variable is being observed at Alice's location does not influence 
the values taken by the other two at Bob's location. To go further still, only after we 
have assumed both counterfactual definiteness and locality, does it make sense to assume 
freedom: the assumption that we can freely choose to observe either A or A', and either B 
or B'. 

Some writers here like to associate the freedom assumption with the free will of the 
experimenter, others with the existence of "true" randomness in other physical processes. 
Thus one metaphysical assumption is justified by another. I would rather focus on prac- 
tical experience and on Occam's razor. We understand fairly well, mathematically and 
physically, how small uncontrollable variations in the initial conditions of the toss of a coin 
lead, for all practical purposes, to completely random binary outcomes "head" or "tail". In 
a quite similar way, we know how the arbitrary choice of seed of a pseudo random number 
generator can lead to a sequence of binary digits which for all practical purposes behaves 
like outcomes of a sequence of coin tosses. Now imagine that both Alice and Bob choose 
the detector settings according either to a physical random number generator such as a 
human being tossing a coin, or a pseudo random number generator. Let's accept coun- 
terfactual definiteness and locality. Do we really believe that the observed correlations 
$1/\sqrt{2}$, $1/\sqrt{2}$, $1/\sqrt{2}$, $-1/\sqrt{2}$ occur through some as yet unknown physical mechanism by 
which the outcomes of Alice's random generators were exquisitely tuned to the measure- 
ment outcomes of Bob's photons? Sure, if the universe is completely deterministic, then 
everything which goes on today was already settled at the time of the big bang, including 
which measurement settings were going to be fed into which photo-detectors. But if you 
really want to believe this, how come we never see a bigger violation of CHSH than the 
$2\sqrt{2}$ which we observe in the Aspect experiment, and how come we never see any evidence 
for dependence between coin tosses and outcomes of distant photo-detectors except in the 
specific scenario of a Bell-CHSH type experiment? 

The idea that we can save local realism by adopting "super-determinism" has not been 
taken seriously by many physicists, except perhaps Gerhard 't Hooft, who argues that 
at the Planck scale we do not have freedom to choose measurement settings. Indeed, so 
far, we do no experiments at all at that scale. This is taken by 't Hooft as legitimization 
for trying to build a local realistic theory of the physical world at an "underlying scale" 
with the idea that the statistical model of quantum mechanics would "emerge" at a higher 
level of description. But it's ludicrous to me that super-determinism at the Planck scale 
could delicately constrain the statistical dependence between coin tosses and distant photo- 
detector clicks. 

But this means we have to make a choice between two other inconceivable possibilities: 
do we reject locality, or do we reject realism? 

Here I would like to call on Occam's principle again. Suppose realism is true. Despite 
the fact that the two particles, if measured in the same way, would exhibit equal and oppo- 
site spins, it can't be the case that those spins were somehow embedded in the particles at 
the source. If we believe the predictions of quantum mechanics for the EPR-B experiment, 
we have to imagine that the act of measuring one of the particles in a particular way had 
an instant effect far away. We don't have any theory for how that effect happens. Well - 
there is a theory called Bohmian mechanics which does create a mathematical framework 
in which this is exactly what does happen. It is a mechanistic description of what goes on 
"under the surface" which exactly reproduces the statistical predictions of quantum the- 
ory, but is it an explanation? It has further defects: it is not relativistically invariant, and 
can't be; even though what happens "on the surface" does have this property. It requires 
a preferred space-time reference frame. Since it "merely" reproduces the predictions of 
quantum mechanics, which we have anyway, we don't actually need it, though as a mathematical 
device it can provide clever tricks for aficionados for solving some problems. So 
far, it has not caught on. 

It seems to me that we are pretty much forced into rejecting realism, which, please 
remember, is actually a highly idealistic concept. I hasten to add that I am not alone 
in this, and could easily cite a number of very authoritative voices in modern quantum 
physics (I will mention just one name now: Nicolas Gisin; another name later). However, 
I admit it somehow goes against all instinct. In the case of equal settings, how can it be 
that the outcomes are equal and opposite, if they were not predetermined at the source? I 
freely admit, there is simply no way to imagine otherwise. 

Possibly lamely I would like here to appeal to the limitations of our own brains, the 
limitations we experience in our "understanding" of physics due to our own rather special 
position in the universe. According to recent research in neuroscience our brains are already 
at birth hardwired with various basic conceptions about the world. These "modules" 
are nowadays called systems of core knowledge. The idea is that we cannot acquire new 
knowledge from our sensory experiences (including learning from experiments: we cry, and 
food and/or comfort is provided!) without having a prior framework in which to interpret 
the data of experience and experiment. It seems that we have modules for elementary 
algebra and for analysis: basic notions of number and of space. But we also have modules 
for causality. We distinguish between objects and agents (we learn that we ourselves are 
agents). Objects are acted on by agents. Objects have continuous existence in space time, 
they are local. Agents can act on objects, also at a distance. Together this seems to me to 
be a built-in assumption of determinism; we have been created (by evolution) to operate 
in an Aristotelian world, a world in which every effect has a cause. 

The argument (from physics and from Occam's razor, not from neuroscience!) for abandoning 
realism is made eloquently by Boris Tsirelson in an internet encyclopedia article on 
entanglement (see footnote 1). It was Tsirelson from whom I borrowed the terms counterfactual definiteness, 
relativistic local causality, and no-conspiracy. He points out that it's a mathematical fact 
that quantum physics is consistent with relativistic local causality and with no-conspiracy. 
In all of physics, there is no evidence against either of these two principles. This is for him 
a good argument to reject counterfactual definiteness. 

I would like to close this section by just referring to a beautiful paper by Masanes, Acin 
and Gisin (2006) who argue in a very general setting (i.e., not assuming quantum theory, 
or local realism, or anything) that quantum non-locality, by which they mean the violation 
of Bell inequalities, and non-signalling, which is the property that the marginal 
probability distribution seen by Alice of A does not depend on whether Bob measures B 
or B', together imply indeterminism: that is to say, that the world is stochastic, not 
deterministic. 

5 Resolution of the Measurement Problem 

The measurement problem (also known as the Schrödinger's cat problem) is the problem of 
reconciling two apparently mutually contradictory parts of quantum mechanics. When a 
quantum system is isolated from the rest of the world, its quantum state (a vector, nor- 
malized to have unit length, in Hilbert space) evolves unitarily, deterministically. When we 
look at a quantum system from outside, by making a measurement on it in a laboratory, 
the state collapses to one of the eigenvectors of an operator corresponding to the particular 
measurement, and it does so with probabilities equal to the squared lengths of the pro- 
jections of the original state vector into the eigenspaces. Yet the system being measured 
together with the measurement apparatus used to probe it form together a much larger 
quantum system, supposedly evolving unitarily and deterministically in time. 

For practical purposes, physicists know how to model which parts of their experiments 
in which way, so as to get results which are confirmed by experiment, and many are 
not concerned with the measurement problem. However, cosmologists wanting to build a 
physical model for the evolution of the whole universe based on quantum physics, have 
a problem. The universe is not just a wave function in an enormous Hilbert space. As 
we experience it, it consists of more or less definite objects following more or less definite 
trajectories in real space-time. 

Accepting that quantum theory is intrinsically stochastic, and accepting the reality 
of measurement outcomes, has led Slava Belavkin (2007) to a mathematical framework 
which he calls eventum mechanics which (in my opinion) indeed reconciles the two faces of 
quantum physics (Schrödinger evolution, von Neumann collapse) by a most simple device. 
Moreover, it is based on ideas of causality with respect to time. I have attempted to 
explain this model in as simple terms as possible in Gill (2009). The following words will 

1 http://en.citizendium.org/wiki/Entanglement_%28physics%29 

only make sense to those with some familiarity with quantum mechanics though at very 
elementary level only. The idea is to model the world, as is conventional from the quantum 
theory point of view, with a Hilbert space, a state on that space, and a unitary evolution. 
Inside this framework we look for a collection of bounded operators on the Hilbert space 
which all commute with one another, and which are causally compatible with the unitary 
evolution of the space, in the sense that they all commute with past copies of themselves 
(in the Heisenberg picture, one thinks of the quantum observables as changing, the state 
as fixed; each observable corresponds to a time indexed family of bounded operators). We 
call this special family of operators the beables: they correspond to physical properties in 
a classical-like world which can coexist, all having definite values at the same time, and 
definite values in the past too. The state and the unitary evolution together determine 
a joint probability distribution of these time-indexed variables, i.e., a stochastic process. 
At any fixed time we can condition the state of the system on the past trajectories of the 
beables. This leads to a quantum state over all bounded operators which commute with 
all the beables. 

The result is a theory in which the deterministic and stochastic parts of traditional 
quantum theory are combined into one completely harmonious whole. In fact, the notion 
of restricting attention to a subclass of all observables goes back a long way in quantum 
theory under the name superselection rule; and abstract quantum theory (and quantum 
field theory) has long worked with arbitrary algebras of observables, not necessarily the full 
algebra of a specific Hilbert space. With respect to those traditional approaches the only 
novelty is to suppose that the unitary evolution when restricted to the sub-algebra is not 
invertible. It is an endomorphism, not an isomorphism. There is an arrow of time. 

Quantum randomness is just time, the quantum future meeting the classical past in 
the present. 

6 Loopholes 

In real world experiments, the ideal experimental protocol of particles leaving a source 
at definite times, and being measured at distant locations according to locally randomly 
chosen settings cannot be implemented. 

Experiments have been done with pairs of entangled ions, separated only by a short 
distance. The measurement of each ion takes a relatively long time, but at least it is almost 
always successful. Such experiments are obviously blemished by the so-called communi- 
cation or locality loophole. Each particle can know very well how the other one is being 
measured. 

Many very impressive experiments have been performed with pairs of entangled pho- 
tons. Here, the measurement of each photon can be performed very rapidly and at huge 
distance from one another. However, many photons fail to be detected at all. For many 
events in one wing of the experiment, there is often no event at all in the other wing, even 
though the physicists are pretty sure that almost all detection events do correspond to 
(members of) entangled pairs of photons. This is called the detection loophole. Popularly 
it is thought to be merely connected to the efficiency of photo-detectors and that it will be 
easily overcome by the development of better and better photodetectors. Certainly that is 
necessary, but not sufficient, as I'll explain. 

In Weihs' experiment mentioned earlier, only 1 in 20 of the events in each wing of the 
experiment is paired with an event in the other wing. Thus of every 400 pairs of photons - 
if we may use such terminology, and if we assume that detection and non-detection occur 
independently of one another in the two wings of the experiment - only 1 pair results in a 
successful measurement of both the photons; there are 19 further unpaired events in each 
wing of the experiment; and there are 361 pairs of photons not observed at all. 
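
The bookkeeping for the 400 pairs can be spelled out in a few lines. This is only a sketch of the arithmetic, under the same independence assumption as in the text; the variable names are mine.

```python
from fractions import Fraction

# Assumption (as in the text): each photon is detected with probability 1/20,
# independently in the two wings of the experiment.
p_detect = Fraction(1, 20)
n_pairs = 400

both_detected = n_pairs * p_detect * p_detect
alice_only = n_pairs * p_detect * (1 - p_detect)
bob_only = n_pairs * (1 - p_detect) * p_detect
neither = n_pairs * (1 - p_detect) * (1 - p_detect)

print(f"both wings: {both_detected}, Alice only: {alice_only}, "
      f"Bob only: {bob_only}, neither: {neither}")
```

The four counts add up to 400, and in each wing 1 of the 20 detected events is paired with an event in the other wing.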

Imagine (anthropocentrically) classical particles about to leave the source and aiming 
to fake the singlet correlations. If they are allowed to go undetected often enough, they can 
engineer any correlations they like, as follows. Consider two new photons about to leave 
the source. They agree between one another with what pair of settings they would like to 
be measured. Having decided on the desired setting pair, they next generate outcomes ±1 
by drawing them from the joint probability distribution of outcomes given settings, which 
they want the experimenter to see. Only then do they each travel to their corresponding 
detector. There, each particle compares the setting it had chosen in advance with the 
setting chosen by Alice or Bob. If they are not the same, it decides to go undetected. 

With probability 1/4 we will have successful detections in both wings of the experi- 
ment. For those detections, the pair of settings according to which the particles are being 
measured is identical to the pair of settings they had aimed at in advance. 
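
The scheme just described is easy to simulate. The following is a minimal sketch, not a model of any real experiment: it assumes the target correlations are the singlet values E(a, b) = -cos(a - b) at four illustrative angles of my choosing, and that Alice and Bob each pick one of two settings by a fair coin toss. Each particle's behaviour depends only on its own pre-agreed plan and on the local setting.

```python
import math
import random

# Hypothetical setting angles (radians); the target correlation -cos(a - b)
# is the assumed singlet prediction that the particles try to fake.
ALICE_ANGLES = {0: 0.0, 1: -math.pi / 2}
BOB_ANGLES = {0: 3 * math.pi / 4, 1: -3 * math.pi / 4}


def target_correlation(i, j):
    return -math.cos(ALICE_ANGLES[i] - BOB_ANGLES[j])


def emit_pair(rng):
    """At the source: agree on a desired setting pair, then on outcomes +/-1."""
    i, j = rng.randrange(2), rng.randrange(2)
    x = rng.choice((-1, 1))
    p_equal = (1 + target_correlation(i, j)) / 2  # gives E[xy] = target
    y = x if rng.random() < p_equal else -x
    return (i, x), (j, y)


def measure(plan, actual_setting):
    """At a detector: reveal the outcome only if the setting is the desired one."""
    desired_setting, outcome = plan
    return outcome if desired_setting == actual_setting else None


def run(n_pairs, seed=0):
    rng = random.Random(seed)
    sums = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for _ in range(n_pairs):
        alice_plan, bob_plan = emit_pair(rng)
        # Experimenters' settings, drawn independently of the particles' plans.
        i, j = rng.randrange(2), rng.randrange(2)
        x, y = measure(alice_plan, i), measure(bob_plan, j)
        if x is not None and y is not None:
            sums[i, j] += x * y
            counts[i, j] += 1
    corr = {k: sums[k] / counts[k] for k in sums}
    chsh = corr[0, 0] + corr[0, 1] + corr[1, 0] - corr[1, 1]
    return sum(counts.values()) / n_pairs, chsh


coincidence_rate, chsh = run(200_000)
print(f"fraction of pairs detected in both wings: {coincidence_rate:.3f}")
print(f"CHSH combination on detected pairs: {chsh:.3f}")
```

About a quarter of the pairs survive, and on those the CHSH combination comes out close to $2\sqrt{2}$ even though the model is entirely local and realistic.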

This may seem silly, but it does illustrate that if one wants to experimentally prove a 
violation of local realism without making an untestable assumption of missing at random, 
one has to put limits on the amount of "non-detections". 

Jan-Åke Larsson (1998, 1999) has proved variants of the CHSH inequality which take 
account of the possibility of non-detections. The idea is that under local realism, as the 
proportion of "missing" measurements increases from zero, the upper bound "2" in the 
CHSH inequality (4) increases too. We introduce a quantity $\gamma$ called the efficiency of the 
experiment: this is the minimum over all setting pairs of the probability that Alice sees 
an outcome given Bob sees an outcome (and vice versa). It is not to be confused with 
"detector efficiency". The (sharp) bound on $\langle AB\rangle_{\lim} + \langle AB'\rangle_{\lim} + \langle A'B\rangle_{\lim} - \langle A'B'\rangle_{\lim}$ set 
by local realism is no longer 2 as in (4), but $2 + \delta$, where $\delta = \delta(\gamma) = 4(\gamma^{-1} - 1)$. 

In particular, as long as $\gamma > 2/(1 + \sqrt{2}) \approx 0.8284$, the bound $2 + \delta$ is smaller than $2\sqrt{2}$. 
Weihs' experiment has an efficiency of 5%. If only we could increase it to above 83% and 
simultaneously get the state and measurements even closer to perfection, we could have 
definitive experimental proof of Bell's theorem. 

This would be correct for a "clocked" experiment. Suppose now that the particles themselves 
determine the times at which they are measured. Thus a local realist pair of particles trying 
to fake the singlet correlations could arrange between themselves that their measurement 
times are delayed by smaller or greater amounts depending on whether the setting they 
see at the detector is the setting they want to see, or not. It turns out that this gives our 
devious particles even more scope for faking correlations. Gill and Larsson (2004) showed that 
the sharp bound on $\langle AB\rangle_{\lim} + \langle AB'\rangle_{\lim} + \langle A'B\rangle_{\lim} - \langle A'B'\rangle_{\lim}$ set by local realism is $2 + \delta$, 
where now $\delta = \delta(\gamma) = 6(\gamma^{-1} - 1)$. As long as $\gamma > 3(1 - 1/\sqrt{2}) \approx 0.8787$, the bound $2 + \delta$ is 
smaller than $2\sqrt{2}$. We need to get experimental efficiency above 88%, and have everything 
else perfect, at the very limits allowed by quantum physics. We have a long way to go. 
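
The two efficiency thresholds follow from solving $2 + \delta(\gamma) = 2\sqrt{2}$ for $\gamma$, with $\delta(\gamma) = c(\gamma^{-1} - 1)$ and $c = 4$ for the clocked (detection loophole) bound or $c = 6$ for the unclocked (coincidence loophole) bound. A small sketch of that calculation follows; the function names are mine.

```python
import math

QUANTUM_MAX = 2 * math.sqrt(2)  # largest CHSH value allowed by quantum mechanics


def local_realist_bound(gamma: float, c: float) -> float:
    """Bound 2 + delta, with delta = c * (1/gamma - 1)."""
    return 2 + c * (1 / gamma - 1)


def critical_efficiency(c: float) -> float:
    """Efficiency at which the local-realist bound equals 2*sqrt(2)."""
    # 2 + c*(1/gamma - 1) = 2*sqrt(2)  =>  gamma = c / (c + 2*sqrt(2) - 2)
    return c / (c + QUANTUM_MAX - 2)


for label, c in (("clocked, c = 4", 4), ("unclocked, c = 6", 6)):
    print(f"{label}: critical efficiency {critical_efficiency(c):.4f}")

# At an efficiency of 5% the clocked bound is far above anything observable.
print(f"clocked bound at 5% efficiency: {local_realist_bound(0.05, 4):.1f}")
```

For $c = 4$ the critical value is $2/(1 + \sqrt{2})$, and for $c = 6$ it is $3(1 - 1/\sqrt{2})$. Below those efficiencies the local-realist bound exceeds $2\sqrt{2}$, so even a perfect quantum violation proves nothing.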

7 Bell's theorem without inequalities 

In recent years new proofs of Bell's theorem have been invented which appear to avoid probability 
or statistics altogether, such as the famous GHZ (Greenberger, Horne, Zeilinger) 
proof. Experiments have already been done implementing the set-up of these proofs, and 
physicists have claimed that these experiments prove quantum-nonlocality by the outcomes 
of a finite number of runs: no statistics, no inequalities (yet their papers do exhibit error 
bars!). 

Such a proof runs along the following lines. Suppose local realism is true. Suppose also 
that some event A is certain. Suppose that it then follows from local realism that another 
event B has probability zero, while under quantum mechanics it can be arranged that the 
same event B has probability one. Paradoxical, but not a contradiction in terms: the catch 
is that events A and B are events under different experimental conditions: it is only under 
local realism and freedom that the events A and B can be situated in the same sample 
space. Moreover, freedom is needed to equate the probabilities of observable events with 
those of unobservable events. 

As an example, consider the following scenario, generalizing the Bell-CHSH scenario to 
the situation where the outcome of the measurements on the two particles is not binary, 
but an arbitrary real number. This situation has been studied by Zohren and Gill (2006) and by 
Zohren, Reska, Gill and Westra (2010). 

Just as before, settings are chosen at random in the two wings of the experiment. Under 
local realism we can introduce variables A, A', B and B' representing the outcomes (real 
numbers) in one run of the experiment, both of the actually observed variables, and of 
those not observed. 

It turns out that it is possible under quantum mechanics to arrange that Pr{B' < A} = 
Pr{A < B} = Pr{B < A'} = 1 while Pr{B' < A'} = 0. On the other hand, under local 
realism, Pr{B' < A} = Pr{A < B} = Pr{B < A'} = 1 implies Pr{B' < A'} = 1. 
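
The local-realist implication is nothing more than transitivity of the ordering of the real numbers plus a union bound. The following short derivation, written in my own notation as a sketch, makes this explicit and also shows what survives when the three probabilities are only close to 1.

```latex
% Under local realism, A, A', B, B' are random variables on one probability space.
% Transitivity: B' < A, A < B and B < A' together force B' < A'.
\{B' < A\} \cap \{A < B\} \cap \{B < A'\} \subseteq \{B' < A'\}.
% Taking complements and applying the union bound:
\Pr\{B' \ge A'\} \le \Pr\{B' \ge A\} + \Pr\{A \ge B\} + \Pr\{B \ge A'\}.
% If the three probabilities on the right are all 0, then \Pr\{B' < A'\} = 1.
```

The second line is an inequality that every local-realist model must satisfy, whatever the values of the individual probabilities.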

Note that the four probability measures under which, under quantum mechanics, Pr{A < 
B}, Pr{A > B'}, Pr{A' > B}, Pr{A' > B'} are defined, refer to four different experimental 
set-ups, according to which of the four pairs (A, B) etc. we are measuring. 

The experiment to verify these quantum mechanical predictions has not yet been per- 
formed though some colleagues are interested. Interestingly, though it requires a quantum 
entangled state, that state should not be the maximally entangled state. Maximal "quantum 
non-locality" is quite different from maximal entanglement. And this is not an isolated 
example of the phenomenon. 

Note that even if the experiment is repeated a large number of times, it can never 
prove that probabilities like Pr{A < B} are exactly equal to 1. It can only give strong 
statistical evidence, at best, that the probability in question is very close to 1 indeed. 
But actually experiments are never perfect and more likely is that after a number of 
repetitions, one discovers that {A > B} actually has positive probability - that event will 
happen a few times. Thus though the proof of Bell's theorem that quantum mechanics is 
in conflict with local realism appears to have nothing to do with probability and only to 
do with logic, in fact, as soon as we try to convert this into experimental proof that Nature 
is incompatible with local realism, we will be in the business of proving (statistically) 
violation of inequalities, again (as the next section will make clear). 

Finally in this section, I have to admit to having lied in my statement "it turns out to 
be possible". Actually, Zohren et al. (2010) only showed that one can arrange that those 
probabilities can be arbitrarily close to 1 and 0: it is not clear that the limiting situation 
still corresponds to states and measurements belonging to the usual framework of quantum 
mechanics. However I believe that this "lie for children" does not affect the moral of the 
story. 

8 Better Bell inequalities 

Why all the attention to the CHSH inequality? There are others around, aren't there? And 
are there alternatives to "inequalities" altogether? Well, in a sense the CHSH inequality is 
the only Bell inequality worth mentioning in the scenario of two parties, two measurements 
per party, two outcomes per measurement. Let's generalize this scenario and consider p 
parties, each choosing between one of q measurements, where each measurement has r 
possible outcomes (further generalizations are possible to unbalanced experiments, multi- 
stage experiments, and so on). I want to explain why CHSH plays a very central role in the 
2x2x2 case, and why in general, generalized Bell inequalities are all there is when studying 
the p x q x r case. The short answer is: these inequalities are the bounding hyperplanes of 
a convex polytope of "everything allowed by local realism". The vertices of the polytope 
are deterministic local realistic models. An arbitrary local realist model is a mixture of 
the models corresponding to the vertices. Such a mixture is a hidden variables model, the 
hidden variable being the particular random vertex chosen by the mixing distribution in a 
specific instance. 

From quantum mechanics, after we have fixed a joint p-partite quantum state, and 
sets of q r-valued measurements per party, we will be able to write down probability 
tables p(a, b, ... | x, y, ...) where the variables x, y, etc. take values in 1, ..., q, and label the 
measurement used by the first, second, ... party. The variables a, b, etc., take values in 
1, ..., r and label the possible outcomes of the measurements. Altogether, there are $q^p r^p$ 
"elementary probabilities" in this list (indexed set) of tables. More generally, any specific 
instance of a theory, whether local-realist, quantum mechanical, or beyond, generates such 
a list of probability tables, and defines thereby a point in $q^p r^p$-dimensional Euclidean space. 

We can therefore envisage the sets of all local-realist models, all quantum models, and 
so on, as subsets of $q^p r^p$-dimensional Euclidean space. Now, whatever the theory, for any 
values of x, y, etc., the sum of the probabilities p(a, b, ... | x, y, ...) must equal 1. These are 
called normalization constraints. Moreover, whatever the theory, all probabilities must be 
nonnegative: positivity constraints. Quantum mechanics satisfies locality (with respect to 
what it talks about!), which means that the marginal distribution of the outcome of any one 
of the measurements of any one of the parties does not depend on which measurements 
are performed by any of the other parties. Since marginalization corresponds again to 
summation of probabilities, these so-called no-signalling constraints are expressed by linear 
equalities in the elements in the probability tables corresponding to a specific model. Not 
surprisingly, local-realist models also satisfy the no-signalling constraints. 

We will call a list of probability tables restricted only by positivity, normalization and 
no-signalling, but otherwise completely arbitrary, a local model. The positivity constraints 
are linear inequalities which place us in the positive orthant of Euclidean space. Normal- 
ization and no-signalling are linear equalities which place us in a certain affine subspace 
of Euclidean space. Intersection of orthant and affine subspace creates a convex polytope: 
the set of all local models. We want to study the sets of local-realist models, of quantum 
models, and of local models. We already know that local-realist and quantum are contained 
in local. It turns out that these sets are successively larger, and strictly so: quantum in- 
cludes all local-realist and more (that's Bell's theorem); local includes all quantum and 
more (that's Tsirelson's inequality combined with an example of a local, i.e., no-signalling 
model which violates Tsirelson's inequality). 
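
A standard example of a no-signalling model that goes beyond Tsirelson's bound of $2\sqrt{2}$ is the so-called Popescu-Rohrlich box. Whether this is the example intended above is my assumption. The sketch below, for the 2x2x2 case with settings and outcomes coded as bits, checks positivity, normalization and no-signalling, and computes the CHSH combination.

```python
from itertools import product

BITS = (0, 1)


def pr_box(a, b, x, y):
    """p(a, b | x, y): probability 1/2 when a XOR b equals x AND y, else 0."""
    return 0.5 if (a ^ b) == (x & y) else 0.0


def alice_marginal(a, x, y):
    return sum(pr_box(a, b, x, y) for b in BITS)


def bob_marginal(b, x, y):
    return sum(pr_box(a, b, x, y) for a in BITS)


def correlation(x, y):
    """Expected product of outcomes, with bit 0 read as +1 and bit 1 as -1."""
    return sum((-1) ** (a ^ b) * pr_box(a, b, x, y)
               for a, b in product(BITS, repeat=2))


normalized = all(
    sum(pr_box(a, b, x, y) for a, b in product(BITS, repeat=2)) == 1.0
    for x, y in product(BITS, repeat=2)
)
# No-signalling: a party's marginal does not depend on the other party's setting.
no_signalling = all(
    alice_marginal(a, x, 0) == alice_marginal(a, x, 1)
    and bob_marginal(a, 0, x) == bob_marginal(a, 1, x)
    for a, x in product(BITS, repeat=2)
)
chsh = (correlation(0, 0) + correlation(0, 1)
        + correlation(1, 0) - correlation(1, 1))

print(f"normalized: {normalized}, no-signalling: {no_signalling}, CHSH: {chsh}")
```

The CHSH combination reaches its algebraic maximum of 4, larger than $2\sqrt{2}$, although neither party can learn anything about the other's setting from their own outcomes.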

Let's investigate the local-realist models in more detail. A special class of local-realist 
models are the local-deterministic models. A local-deterministic model is a model in which 
all of the probabilities p(a, b, ... | x, y, ...) equal 0 or 1 and the no-signalling constraints are 
all satisfied. In words, such a theory means the following: for each possible measurement 
by each party, the outcome is prescribed, independently of what measurements are made 
by the other parties. Now, it is easy to see that any local-realist model corresponds to 
a probability mixture of local-deterministic models. After all, it "is" a joint probability 
distribution of simultaneous outcomes of each possible measurement on each system, and 
thus it "is" a probability mixture of degenerate distributions: fix the random element $\omega$, 
and each outcome of each possible measurement of each party is fixed; we recover their 
joint distribution by picking $\omega$ at random. 

This makes the set of local-realist models a convex polytope: all mixtures of a finite 
set of extreme points. Therefore it can also be described as the intersection of a finite 
collection of half-spaces, each half-space corresponding to a boundary hyperplane. 

It can also be shown that the set of quantum models is closed and convex, but its 
boundary is very difficult to describe. 

Let's think of these three sets of models from "within" the affine subspace of no-signalling and 
normalization. Relative to this subspace, the local models form a full (non-empty interior) 
closed convex polytope. The quantum models form a strictly smaller closed, convex, full 
set. The local-realist models form a strictly smaller still, closed, convex, full polytope. 

Slowly we have arrived at a rather simple picture. Imagine a square, with a circle 
inscribed in it, and with another smaller square inscribed within the circle. The outer 
square represents the boundary of the set of all local models. The circle is the boundary 
of the convex set of all quantum models. The square inscribed within the circle is the 
boundary of the set of all local-realist models. The picture is oversimplified. For instance, 
the vertices of the local-realist polytope are also extreme points of the quantum body and 
vertices of the local polytope. 

A generalized Bell inequality is simply a boundary hyperplane, or face, of the local- 
realist polytope, relative to the normalization and no-signalling affine subspace, but ex- 
cluding boundaries corresponding to the positivity constraints. I will call these interesting 
boundary hyperplanes "non-trivial". In the 2x2x2 case, for which the affine subspace 
where all the action lies is 8 dimensional, the local-realist polytope has exactly 8 non- 
trivial boundary hyperplanes. They correspond exactly to all possible CHSH inequalities 
(obtained by permuting outcomes, measurements and parties). Thus in the 2x2x2 case, 
the Bell-CHSH inequality is indeed "all there is". 
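
In the 2x2x2 case the vertices of the local-realist polytope can be listed by brute force: a local-deterministic model is just an assignment of ±1 to each of A, A', B, B'. The sketch below (names are mine) enumerates the 16 assignments and checks that each of the 8 CHSH expressions, obtained by choosing which of the four products carries the minus sign and an overall sign, never exceeds 2 on any of them.

```python
from itertools import product

# The 16 local-deterministic models: fixed outcomes (A, A', B, B') in {-1, +1}.
vertices = list(product((-1, 1), repeat=4))


def chsh_value(vertex, minus_position, overall_sign):
    """One CHSH expression evaluated at a deterministic vertex."""
    a, a_prime, b, b_prime = vertex
    terms = [a * b, a * b_prime, a_prime * b, a_prime * b_prime]
    terms[minus_position] *= -1  # the one product entering with a minus sign
    return overall_sign * sum(terms)


# 4 choices for the position of the minus sign times 2 overall signs.
variants = [(k, s) for k in range(4) for s in (1, -1)]

max_at_vertices = {
    variant: max(chsh_value(v, *variant) for v in vertices)
    for variant in variants
}

print(f"{len(vertices)} vertices, {len(variants)} CHSH expressions")
print("largest value of each expression over the vertices:",
      sorted(set(max_at_vertices.values())))
```

Since every local-realist model is a mixture of these vertices and each expression is linear in the probabilities, the bound 2 carries over to the whole polytope.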

When we increase p, q or r, new Bell inequalities turn up, and moreover keep turning up 
("new" means not obtainable from "old" by omitting parties or measurements or grouping 
outcomes). It seems a hopeless (and probably pointless) exercise to try to classify them. 
Incidentally, it's an open question as to whether every generalized Bell inequality, as defined 
here, is violated by quantum mechanics. 

Quite a few generalized Bell inequalities have turned out to be of particular interest. 
For instance, the work of Zohren and Gill concerned the 2 x 2 x r case and discussed a class 
of inequalities, one for each r, whose asymptotic properties could be studied as r increased 
to infinity. 

9 Bell's fifth position 

There is an interesting and neglected gap in the proof of Bell's theorem, which has been 
pointed out only by a few authors, among them the present writer (Gill, 2003). Quantum 
theory allows for the existence of entangled states, but does this imply their existence 
in nature? In particular, is it possible to create entangled states "to order" of several 
spatially distant and individually spatio-temporally localized particles? This is what a 
successful loophole-free Bell experiment requires. The experiments on entangled photons 
are bedevilled by the fact that as the distance between the measurement stations increases, 
the efficiency of the set-up decreases. Recall: in a Bell-type experiment, "detection effi- 
ciency" should be defined as the minimum over all setting combinations, of the probability 
of having a detection in one wing of the experiment, given a detection in the other (both 
for Alice given Bob, and for Bob given Alice). It is not just a property of photodectors but 
of the combined paths from source to detectors). Some of the loss of individual photons 
takes place already in the emission phase (typically inside a crystal excited by a laser) 
during which photons are propagating in three-dimensional space, not linearly. 
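
The definition of detection efficiency recalled above can be turned into a small computation. The following Python sketch estimates it from detection counts. The layout of the counts (for each setting pair: coincidences, Alice-only detections, Bob-only detections) and the numbers in the example are hypothetical and serve only as an illustration.

```python
def detection_efficiency(counts):
    """Estimate detection efficiency as defined in the text.

    counts maps each setting pair to a tuple
    (n_both, n_alice_only, n_bob_only) of detection counts.
    """
    efficiencies = []
    for n_both, n_alice_only, n_bob_only in counts.values():
        # Estimated P(Alice detects | Bob detects) for this setting pair
        efficiencies.append(n_both / (n_both + n_bob_only))
        # Estimated P(Bob detects | Alice detects) for this setting pair
        efficiencies.append(n_both / (n_both + n_alice_only))
    # The relevant figure is the worst case over settings and directions.
    return min(efficiencies)

# Hypothetical counts for the four setting combinations
example = {
    ("A", "B"): (80, 20, 10),
    ("A", "B'"): (70, 30, 30),
    ("A'", "B"): (90, 10, 10),
    ("A'", "B'"): (85, 15, 5),
}
print(detection_efficiency(example))
```

With these made-up counts the minimum is attained at the setting pair (A, B'), where both conditional detection frequencies equal 70/100.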

Successful experiments have been done on ions (the binary property being excited/not
excited) separated by a tiny distance from one another in an ion trap. These experiments
have close to 100% efficiency but the measurement of the energy level of the ions takes
an enormous length of time (relatively speaking) to complete. So far it has proved very
difficult to create entanglement between massive objects at macroscopic distance from one
another. In view of the duration of their measurement, much larger distances are required
than for experiments on photons.

The detection and measurement of a photon is extremely rapid. The quantum physics of
the propagation of entangled photons tells us that the two photons will be in the required
entangled state at their respective detectors, given that they both arrive there within
the right time window. Increasing the separation between the detectors increases quantum
uncertainty in the time they will arrive at their detectors, as well as increasing their chance
(partly quantum mechanical in origin) of not being detected at all. Could it be that
quantum physics itself prevents a loophole-free and successful Bell-type experiment
from ever being performed?

10 Quantum Randi challenges 

A second reason for the specific form of the proof of Bell's theorem which started this
paper is that it lends itself well to the design of computer challenges. Every year, new
researchers publish, or try to publish, papers in which they claim that Bell made some
fundamental errors, and in which they put forward a specific local realist model which
allegedly reproduces the quantum correlations. The papers are long and complicated; the
author finds it hard to get the work published, and suspects a conspiracy by The Establishment.

Now it is unlikely that someone will ever come up with a disproof of a famous and now
50 years old theorem, especially since, as I hope I have made clear, the mathematical essence
of that theorem is pretty trivial. A disproof of Bell's theorem is as likely as a proof that the
square root of 2 is not an irrational number.

I have found it useful in debates with "Bell-deniers" to challenge them to implement
their local realist model as computer programs for a network of classical computers,
connected so as to mimic the time and space separations of the Bell-CHSH experiments. A
somewhat similar challenge has been independently proposed by Sascha Vongehr^2, who
gave his challenge the name "quantum Randi challenge", inspired by the well known
challenge of James Randi (scientific sceptic and fighter against pseudo-science^3) concerning
paranormal claims. Vongehr's challenge differs in a number of significant respects from
mine, for various good reasons. My challenge is not a quantum Randi challenge in Vongehr's
sense (and he coined the phrase). Some differences will be mentioned in a moment.

The protocol of the challenge I have issued in the past is the following. Bell-denier is to 
write computer programs for three of his own personal computers, which are to play the 
roles of source S, measurement station A, and measurement station B. The following is 
to be repeated say 15 000 times. First, S sends messages to A and B. Next, connections 
between A, B and S are severed. Next, from the outside world so to speak, I deliver the 
results of two coin tosses (performed by myself), separately of course, as input setting to 
A and to B. Heads or tails correspond to a request for A or A' at A, and for B or B' at 
B. The two measurement stations A and B now each output an outcome ±1. Settings and 

2 http://www.science20.com/alpha_meme/official_quantum_randi_challenge-80168
3 http://en.wikipedia.org/wiki/James_Randi
outcomes are collected for later data analysis; Bell-denier's computers are re-connected;
next run.

Bell-denier's computers can contain huge tables of random numbers, shared between
the three, and of course they can use pseudo-random number generators of any
kind. By sharing the pseudo-random keys in advance, they have access to any amount
of shared randomness they like.
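
To make the protocol concrete, here is a minimal Python simulation of it for one particular local realist model. Everything specific in it is an assumption made for illustration: the shared message is a uniformly random angle, each station outputs the sign of a cosine, the four setting angles are chosen by hand, and the win criterion is taken to be halfway between 2 and 2√2. It is not the protocol software itself, only a sketch of what a Bell-denier's three programs and the referee's coin tosses might look like when run inside a single process.

```python
import math
import random

# Hypothetical measurement angles for settings A, A' and B, B'
ANGLES_A = {0: 0.0, 1: math.pi / 2}
ANGLES_B = {0: math.pi / 4, 1: -math.pi / 4}

def station(lam, angle):
    """Local outcome: depends only on the shared message and the local setting."""
    return 1 if math.cos(lam - angle) >= 0 else -1

def run_challenge(n_runs=15_000, seed=1):
    source = random.Random(seed)     # Bell-denier's shared randomness (source S)
    coins = random.Random(seed + 1)  # referee's coin tosses, unknown to S
    sums = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for _ in range(n_runs):
        lam = source.uniform(0.0, 2 * math.pi)  # S sends the same message to A and B
        i, j = coins.randrange(2), coins.randrange(2)  # settings delivered after S is cut off
        x = station(lam, ANGLES_A[i])  # A sees only lam and its own setting
        y = station(lam, ANGLES_B[j])  # B sees only lam and its own setting
        sums[(i, j)] += x * y
        counts[(i, j)] += 1
    corr = {k: sums[k] / counts[k] for k in sums}
    # <AB> + <AB'> + <A'B> - <A'B'> from the observed averages
    return corr[(0, 0)] + corr[(0, 1)] + corr[(1, 0)] - corr[(1, 1)]

threshold = 1 + math.sqrt(2)  # halfway between 2 and 2*sqrt(2)
s = run_challenge()
print(s, s > threshold)
```

For this model the four theoretical correlations are 1/2, 1/2, 1/2 and -1/2, so the statistic fluctuates around 2 and stays well below the threshold of about 2.41.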

In Gill (2003) I showed how a martingale Hoeffding inequality could be used to generalize
the exponential bound (3) to the situation just described. This enabled me to choose
4N, and a criterion for win/lose (say, halfway between 2 and 2√2), and a guarantee to
Bell-denier (at least so many runs with each combination of settings), such that I would
happily bet 3000 Euros any day that the Bell-denier's computers will fail the challenge.

The point (for me) is not to win money for myself, but to enable the Bell-denier 
who considers accepting the challenge (a personal challenge between the two of us, with 
adjudicators to enforce the protocol) to discover for him or herself that "it cannot be done" . 
It's important that the adjudicators do not need to look inside the programs written by the 
Bell-denier, and preferably don't even need to look inside his computers. They are black 
boxes. The only things that have to be enforced are the communication rules. However,
there are difficulties here. What if Bell-denier's computers are using a wireless network 
which the adjudicators can't detect? 

Sascha Vongehr has proposed a somewhat simpler protocol, for which he believes that
a total of 800 runs is enough. In his challenge, the quantum Randi challenge so to speak,
Bell-denier has to write programs which will run on any decent computers. The computers
will be communicating by internet and the Bell-denier's programs, not his computers, have
to beat Bell repeatedly when other persons run Bell-denier's programs, again and again.
The point is that if the Bell-denier posts his programs on the internet and people test them
and they work (and if they work, people will test them!), The Internet will spread the
news far and wide - there is no way The Establishment can prevent the news coming out.
Vongehr requires that the Bell-denier's computer programs succeed "most times" at
beating the bound 2, while I required a single success at exceeding a larger bound (though
smaller than 2√2).

Let me return to my "one-on-one" challenge. We'll have to trust one another enough
that Bell-denier uses only, say, a specific wireless network, which the adjudicators can
switch on and off, for communication between his three computers. Still, the need
to connect and disconnect the three computers many times, in synchrony with delivery of
settings and broadcasting of outcomes, causes logistical nightmares, and also raises timing
problems. The length of time of the computation on A could be used to signal to B if
we wait for both computers to be ready before they are connected to the outside world to
output their results. So the network has to be switched on and off at regular time intervals
synchronized with delivery of settings and broadcasting of outcomes.

For this and for many other reasons, it would be extremely convenient to allow bulk
processing. My bulk processing protocol runs as follows: S sends messages to both A and
B corresponding to say 4N entangled pairs of particles, all in one go. (This actually makes
S redundant: Bell-denier just needs to clone two copies of it as a virtual computer
inside A and B.) Next, I deliver 4N coin tosses each to A and B as settings. Finally, after
a pre-agreed length of time, A and B each deliver 4N outcomes ±1.

Challenge to the reader: prove a useful inequality like (3) for this situation, or prove 
that it cannot be done. 

What is a useful inequality; what is the problem here, anyway? A useful inequality should 
have an error probability which, as that in (3), becomes arbitrarily small as N increases. 
The problem is that through having access to all settings simultaneously, the Bell-denier 
is able to create dependence between outcomes of different runs. It's clear that this can be 
used to increase the dispersion of the outcomes, though the mean values are not affected. 
How far can the dispersion be increased? 

So far I only succeeded in obtaining results using the boundedness of the summands, 
a Bell-inequality bound on their mean, and Markov's inequality. The resulting error prob- 
ability does not depend on N at all, so it's useless for a one-on-one, one shot challenge. 
It can be used for a mixed bulk/sequential challenge, for instance, 100 batches of size 800 
(or as much as Bell-denier feels comfortable with) each, in which the Bell-denier should 
achieve a high score in a large proportion of the batches. That's something; but can we do 
better? 
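
To see why a Markov-type argument gives an error probability that is free of N, consider the following hedged illustration. It assumes a test statistic S that always lies between -4 and 4 and whose mean under local realism is at most 2. These are assumptions made for the sketch, not necessarily the exact normalisation used above. Markov's inequality applied to the non-negative variable S + 4 then bounds the probability that S reaches 2 + η. The function name and default values are hypothetical.

```python
import math

def markov_error_bound(eta, mean_bound=2.0, lower=-4.0):
    """Upper bound on P(S >= mean_bound + eta).

    Assumes S >= lower always and E[S] <= mean_bound, and applies
    Markov's inequality to the non-negative variable S - lower:
    P(S - lower >= t) <= E[S - lower] / t.
    """
    return (mean_bound - lower) / (mean_bound + eta - lower)

eta = math.sqrt(2) - 1  # criterion halfway between 2 and 2*sqrt(2)
print(round(markov_error_bound(eta), 3))  # 0.935
```

The number of runs does not enter the formula at all, which is exactly why such a bound is of no use for a single one-shot batch and only becomes informative when many batches are combined.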

(?) at which I challenged the statistical causality community to come up with an account of
Bell-CHSH type experiments. In particular, any decent theory of causality should answer 
the question: does Alice's choice of measurement have a causal effect on Bob's outcome? 

At the time no answer was forthcoming. I now consider that the answer is yes, if you 
accept counterfactual definiteness, no if not. I consider counterfactual definiteness not an 
"optional" philosophical or metaphysical position, but rather a physical property, which 
might hold true in some realms but not others. In the quantum realm I believe there are 
very good physical grounds to reject counterfactual definiteness. Moreover, doing so leads 
to resolution of many otherwise problematic aspects of quantum theory. 